Moderator: Good evening, I am Du Peng of the Sohu IT channel. I am glad that so many friends have come to our event. We are honored to have Yumin Wei (Denis), General Manager of CIC's Technology Division, with us. Denis traveled from Shanghai today and came to our marketing salon straight off the plane. We have invited him to share some of CIC's research results on online word of mouth. Over to you, Denis.
Mr. Yu Minwei, General Manager of CIC's Technology Division, speaking as a guest at the online marketing salon
Today's Internet users and creativity in word-of-mouth marketing
I am very happy to see so many friends join our Sohu IT marketing salon. I am also grateful to Sohu for organizing this salon as a platform where people can regularly exchange experience and knowledge. I believe everyone here today is a senior marketing or public relations expert, and I expect that in your work for clients you are coming into contact with online word of mouth more and more.
In fact, good content has a great deal to do with a successful marketing case. How do you know that the content is what the target audience likes? Even for Shantie, people need to understand the market and judge what consumers are discussing. I once met a friend who told me that they had taken on a word-of-mouth marketing case and had also mined the content being discussed online. But one element of the plan was a phrase, which I will not repeat here, that would seem normal to ordinary people. The target audience has its own unique culture and took the remark as offensive, which led to some misunderstandings.
In my personal opinion, flooding forums with filler posts, or Shantie, is a rather crude form of online word-of-mouth marketing. It is not real online word-of-mouth marketing. It is more one-way and controlled, and it does not make full use of the value of online word of mouth. This value lies in ...
Second, Internet users not only help you spread information, they also contribute a lot of creative content. We previously helped L'Oreal with planning, and in one case a fan wrote a poem, entirely spontaneously, out of love for L'Oreal products. The poem spread very widely on the Internet. Through our analysis we told L'Oreal that this was a good idea, and the event they planned along those lines was a big success.
Users do not just help you spread the content you want spread. They also give you all kinds of creative contributions, which can combine well with the users' own interests.
Guests at the event listening attentively
How is online word-of-mouth information collected?
Through word of mouth you can learn about cultural trends on the network, so this information is very important when you plan marketing activities or assess their results. We can search with Google or Baidu, but those are consumer-facing search engines aimed at general Internet users. They will not tell you, for a given search, what percentage of the results is positive and what percentage is negative. For that you need other technology that lets your analysis go deeper.
Let me first talk about data collection, which is an important part of online word-of-mouth analysis. We often use this analogy: data is like oil. However advanced your car is, without oil it does not move. Data is the raw material of the entire analysis, so you need a comprehensive, integrated data set to support the system behind it.
The market offers many technology platforms and tools, and there are many data collection methods. Here are a few of the more common ones.
The first is web page-based data collection, as done by search engines. Today's general-purpose search engines collect page by page. But on a BBS or a blog, one page may contain many posts or many topics. Treating the whole page as a single article is obviously coarse, and it cannot support a finer breakdown later. If the page is instead split into individual posts, the analysis can go deeper.
The second case, which we often see, is analysis systems that collect data by keyword. Say I am interested in what has happened with KFC recently: I set that keyword, and then I can see the discussions associated with it, and all subsequent analysis is based on them.
Another method collects for a particular industry. It gathers the data of all the representative sites of that industry so that a comprehensive analysis can be done on them. For example, in the automobile industry there are some leading, more professional social media sites in China, and we can do a comprehensive analysis of their data. An analysis must be supported by a sample behind it. This is another point of difference.
In addition, if collection is post-based, there is something to pay attention to: on a BBS, posts are divided into first posts and replies, and this distinction has its value. For example, when we analyze Internet users, we look at whose posts drew the largest number of replies and who drew in the largest number of people. If the post in question is a first post, that is very valuable information.
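The post-level idea above can be sketched as follows. This is only an illustration, not CIC's system: the record layout (`thread_id`, `author`, `is_first_post`) and the sample data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical post-level records: one entry per post, not per page.
posts = [
    {"thread_id": 1, "author": "amy", "is_first_post": True},
    {"thread_id": 1, "author": "bob", "is_first_post": False},
    {"thread_id": 1, "author": "cat", "is_first_post": False},
    {"thread_id": 2, "author": "bob", "is_first_post": True},
    {"thread_id": 2, "author": "amy", "is_first_post": False},
]

def replies_drawn_by_author(posts):
    """Count, for each thread starter, the replies their first posts drew."""
    # Map each thread to the author of its first post.
    starter = {p["thread_id"]: p["author"] for p in posts if p["is_first_post"]}
    counts = Counter()
    for p in posts:
        if not p["is_first_post"] and p["thread_id"] in starter:
            counts[starter[p["thread_id"]]] += 1
    return counts

reply_counts = replies_drawn_by_author(posts)
print(reply_counts.most_common(1))
```

Page-level collection could not answer this question, because it keeps no record of which post is the first post and which are replies.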
The last piece is data quality control. This is something of a Chinese characteristic. Abroad, we know that some of the more standard APIs make data open, and foreign social media use open data structures. In China this is not particularly strong, especially for media such as BBS and blogs, where a change on the site can throw off your collection program. So you need a relatively complete data quality control system. If you find that the amount of discussion about the automobile industry fell by 50% this month, you need to know whether the discussion really fell by 50%, or whether collection failed for a site with a large amount of discussion.
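One simple way to tell a real drop from a collection failure is to look at volumes per site rather than only the total. The sketch below assumes hypothetical per-site post counts and an arbitrary threshold (`failure_ratio`); neither comes from the talk.

```python
def explain_drop(last_month, this_month, failure_ratio=0.1):
    """Return the overall change and the sites whose volume collapsed.

    A site is flagged when its volume falls below failure_ratio of last
    month's volume, which suggests a crawler failure rather than a real
    change in discussion. The threshold is an illustrative assumption.
    """
    total_last = sum(last_month.values())
    total_now = sum(this_month.get(site, 0) for site in last_month)
    change = (total_now - total_last) / total_last
    suspects = sorted(
        site for site, n in last_month.items()
        if n > 0 and this_month.get(site, 0) < failure_ratio * n
    )
    return change, suspects

# Hypothetical monthly post counts per site.
last_month = {"site_a": 6000, "site_b": 3000, "site_c": 1000}
this_month = {"site_a": 0, "site_b": 3900, "site_c": 1100}

change, suspects = explain_drop(last_month, this_month)
print(change, suspects)
```

Here the total falls by half, but the other sites actually grew, so the drop points to a collection problem on one large site.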
The second point is natural language analysis, which is the relatively high-tech piece. As I said earlier, China has many Internet users, and the amount of data generated each month is very large. If you only collect the data and tell your customer how many car posts and how many discussions there were this month, the value of that is not very high. The customer cares about how many posts are about my brand and how many people discuss my products. When people talk about cars, which aspects do they care about more: the engine, or the interior? How users feel is also very important.
This cannot be solved by relying on people alone. You cannot have people go through the data bit by bit and label each item: which brand it talks about, which product, which attribute, and what emotion the user expresses. The data volume is too great. So natural language technology is applied in this area, using fast computers to help people read these posts.
For instance, can the technology mark which products and which attributes a post is talking about? Users often do not post in standard language. They may use entirely their own language, and a post may mention several products and compare their attributes, so the computer has to break the post down. Does the technology use grammatical analysis, working at the sentence level, to mark this information out very finely? This is a critical point. With a syntax-based text mining method, you can see very precisely which thing each key word refers to. In this example, the keyboard is a product attribute of the N71, and "easy to use" refers to the keyboard. Each article contains very complex semantic associations. This matters for the subsequent analysis, because if the identification is inaccurate, the analysis results will be wrong and will ultimately affect your decisions.
Another point: online analysis in each industry is very complex. Clients often need to know not only about their own products but also about competing products. Does the system help the customer define all of this industry information? If the customer cares about Nokia, is it just a matter of configuring one keyword? We all know that Internet users talk about Nokia in many ways: the standard Chinese name, the English name, and various other references. Does the system help users cover the brand or product they care about more comprehensively? This is a very important point. Otherwise, if users set the keywords themselves and enter only the Chinese name or only the English name, they could lose 30% of the discussion, which would not be included in the analysis system. This is the definition of industry information.
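A minimal sketch of the alias idea, assuming a hand-curated alias table. The table contents and the function name `brands_mentioned` are illustrative; a real table would be built per industry and would include nicknames found in the data.

```python
# Hypothetical alias table: each brand maps to the names users write.
BRAND_ALIASES = {
    "Nokia": ["nokia", "诺基亚"],
}

def brands_mentioned(text, aliases=BRAND_ALIASES):
    """Return the brands whose aliases appear anywhere in the text."""
    lowered = text.lower()
    return {
        brand
        for brand, names in aliases.items()
        if any(name in lowered for name in names)
    }

print(brands_mentioned("My NOKIA N71 keyboard is easy to use"))
print(brands_mentioned("新买的诺基亚键盘很好用"))
```

A single keyword would have matched only one of these two posts. Plain substring matching is used here for brevity and would be too loose for short nicknames.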
How CIC analyzes the collected data
Once natural language analysis is done, a very large amount of data has been produced. The goal is then to deliver this data to our analysts or to the client so that they can query it conveniently. Those of you on the technical side may know that when a database holds very complex information, querying it directly is very inefficient. To let analysts or users analyze the results quickly, you need data warehouses behind the data to support fast queries.
For example, here is a screen from our system. You can query which product attributes are discussed, the number of posts, and how many people are in the discussion, and get the results very quickly.
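The reason a warehouse answers such queries quickly is that counts are aggregated ahead of time instead of being computed from raw posts on every query. A minimal sketch of that pre-aggregation, with a hypothetical record layout (`product`, `attribute`, `author`):

```python
from collections import defaultdict

def build_summary(mentions):
    """Pre-aggregate post counts and distinct authors per (product, attribute)."""
    post_counts = defaultdict(int)
    authors = defaultdict(set)
    for m in mentions:
        key = (m["product"], m["attribute"])
        post_counts[key] += 1
        authors[key].add(m["author"])
    return {
        key: {"posts": post_counts[key], "people": len(authors[key])}
        for key in post_counts
    }

# Hypothetical output of the natural language analysis step.
mentions = [
    {"product": "N71", "attribute": "keyboard", "author": "amy"},
    {"product": "N71", "attribute": "keyboard", "author": "amy"},
    {"product": "N71", "attribute": "keyboard", "author": "bob"},
    {"product": "N71", "attribute": "screen", "author": "cat"},
]

summary = build_summary(mentions)
print(summary[("N71", "keyboard")])
```

A query for one product attribute then becomes a single lookup in the summary rather than a scan of every post.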
There is also the technology of drilling down to the original information. When you get analysis data showing that the amount of discussion about a product this month was one million messages, the next question is what those million messages are actually saying. You want to see the original content. This seems a very simple thing, but with a large amount of data, performance is significantly affected. You need a very good engine over the raw data to support drilling from an indicator down to the original data, and you may also need to search further within it. Say I see that discussion of car engines suddenly surged this month. After drilling in, I may want to search whether people are talking about V4 or V6 engines. When you evaluate a tool, you may want to ask whether it has this kind of drill-down technology inside, so that when you get the tool you see not only the indicators but can also quickly see the original information behind them.
In the example just cited, if you see some interesting data in an indicator and want to dig into the original data, can you easily switch from the indicator to the original information behind it?
What I have just described are the individual technology components, but all of them consume a lot of storage and computation. There is a term that is popular now, and it is very important here. Otherwise, even if you have good data collection technology and good natural language technology, you will find that you cannot get the data out, because the computation is too great. If you want to build larger tools or do large-scale analysis, this technology shows its importance. Generally speaking, it is distributed computation: through certain mechanisms, certain algorithms, and the layout of the computers, the system can easily expand its processing capacity and storage capacity.
After all the data has been computed, it still has to be shown to people. Data that only sits in the database is useless. The key is for analysts or customers to see it. So the last piece is data visualization technology. A lot of data can be presented in the simplest form, a list, but people are often far less sensitive to numbers than to graphics.
CIC began working on this technology in 2004 and began developing its own technology. In 2004 we built a web-based deep data collection engine. In 2005 we built keyword-based broad collection. In 2006 we developed our own natural language analysis. Since 2006 we have continued to strengthen data storage and computing and to invest in data visualization. We have accumulated a lot on our technology platform, so we now have a new platform, and some of our newer products show the results of this research and development.
I think that is basically it. I do not know whether you have any questions.
The Sohu event was well attended and guests asked questions
Guest question: You just mentioned collecting from forum sites and analyzing the data. The sites you collect from include forums. What criteria do you use to select these forums or BBS? Could you introduce that?
Yumin Wei: There is one principle we need to establish. Because China's Internet is so large, the first thing to accept in analysis is that it is not possible to collect all the data. Customers often tell me that someone can search the whole Chinese Internet in a second and get all the data. This claim is essentially unrealistic.
We have done research in this area and found that in many industries in China, discussion follows the 80/20 principle: 80% of the discussion occurs on 20% of the sites. In collecting data we do not pursue 100% coverage. We do some research in advance to choose the more representative sites. This is somewhat similar to sampling theory in traditional analysis: you do not pursue one hundred percent coverage, you look for a representative sample as your data source. This is partly a technical consideration, and this method can still provide valuable information to customers.
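The site selection described above can be sketched as picking the smallest set of sites, largest first, that reaches a target share of the discussion. The volumes below are invented for illustration, and the function name `sites_for_coverage` is an assumption.

```python
def sites_for_coverage(volume_by_site, target_percent=80):
    """Pick sites by descending volume until the target share is covered."""
    total = sum(volume_by_site.values())
    chosen, covered = [], 0
    ranked = sorted(volume_by_site.items(), key=lambda kv: kv[1], reverse=True)
    for site, volume in ranked:
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= target_percent * total:
            break
        chosen.append(site)
        covered += volume
    return chosen, covered / total

# Hypothetical discussion volume per site from an advance survey.
volume_by_site = {
    "a": 500, "b": 300, "c": 80, "d": 50, "e": 40,
    "f": 20, "g": 5, "h": 3, "i": 1, "j": 1,
}

chosen, share = sites_for_coverage(volume_by_site)
print(chosen, share)
```

In this made-up sample, two of the ten sites carry 80% of the discussion. When no small set of sites reaches the target, the industry does not follow the 80/20 pattern and a different collection method is needed.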
Guest question: Posts used to be anonymous, and now some may carry names. Can accurate analysis be done on the ID of the information source?
Yumin Wei: On most social media, whether the name behind the content is real or just an IP, there is some identifier that says at least who sent the information. We also hope to take the analysis down to that layer. This involves online privacy protection. The goal is not to find better ways to pull out people's real names, because then Internet users would worry that their privacy is not protected. But in our experience, for most information at least the registered IP or a nickname can be obtained. The exception, where it cannot be obtained, is data from search engines.
For example, suppose a brand has found a spokesperson and wants to see what the word of mouth about that spokesperson is on the network. Or take Pepsi: you cannot find the 20% of forums that hold 80% of the daily discussion of Pepsi. In such cases you have to use search engines, a broader platform that can get you some information. But search engines currently cannot give you information about the people posting, so this affects the authenticity of the data and the depth of analysis you can reach. The realistic approach is to adjust the level of the analysis. For example, when an analysis we do for customers involves search engine data, it may not be as deep as our analysis of BBS or microblogs.
Guest question: I have two questions. You have touched on the first one, but I would like to know a little more. For cars or IT products there are dedicated forums that discuss these products, and you can collect data by crawling posts. For a beverage product such as Coca-Cola, discussion is hardly concentrated in a few forums. What is your methodology for collecting data in such an area?
Yumin Wei: As mentioned earlier, this situation does arise. It differs by industry, and analysis needs differ by product. In our experience, limits on the data do not mean that the data has no value. If the discussion is dispersed and there is no 20% of sites that gives you 80% of the discussion, then at present we use search engines. But we do not simply type a keyword into a search engine and search, because from an analysis point of view, search engine data has several quality problems.
The first is incomplete author information. It is difficult to get real author information from a search engine, so it is hard to identify how many people took part in a discussion or replied to a message.
The second is inaccurate time. Search engines such as Google and Baidu give the time of the snapshot, whereas what customers really care about is when the article was posted. This causes another data quality problem.
The third problem is content. When you search, each result contains only an excerpt, not the full text, which limits the content available. Because of these three limitations, some very detailed analysis cannot be done. For example, as just mentioned, if I want to measure which Internet users had high influence during a campaign, our practice on BBS is to look at how many posts a user made himself, how much discussion he caused, and how many people followed him. With search engine data this cannot be done.
In this situation your analysis cannot go as deep as it does on BBS, because of the data limitations. But through search engines you can at least get an approximate volume, for example how much a post is being discussed on the network, and how it has spread across different media. You can get this because each result has a website address. You see it on ten or so sites today and on more than twenty sites tomorrow, and by comparing you can see what the propagation path looks like.
Once you find points worth further study, and find that the discussion on certain sites is more representative or more useful, you can combine the two approaches. First use the breadth of a search engine to find the points you want to dig into, then fix on those forums and analyze them thoroughly. This is the solution we adopted.
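Counting distinct sites per day from result URLs can be sketched as below. The dates and URLs are placeholders, and the input format (a list of day and URL pairs) is an assumption. How the results are obtained from a search engine is outside this sketch.

```python
from collections import defaultdict
from urllib.parse import urlparse

def sites_per_day(results):
    """Count the distinct sites on which a topic appears each day."""
    seen = defaultdict(set)
    for day, url in results:
        # The host part of the URL identifies the site.
        seen[day].add(urlparse(url).netloc)
    return {day: len(sites) for day, sites in sorted(seen.items())}

# Hypothetical (day, result URL) pairs for one tracked topic.
results = [
    ("2010-01-01", "http://bbs.example.com/t/1"),
    ("2010-01-01", "http://bbs.example.com/t/2"),
    ("2010-01-02", "http://bbs.example.com/t/1"),
    ("2010-01-02", "http://blog.example.org/p/9"),
]

spread = sites_per_day(results)
print(spread)
```

Comparing the counts day by day gives the rough shape of the spread even without author or reply information.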
Guest question: My second question. You just described analyzing good and bad reviews of Nokia from all angles. Is the data acquisition and analysis done manually or by machine?
Yumin Wei: On the car side alone, we collect about 14 million pieces of original information daily. Having people read through those 14 million items would be a very heavy workload, so this is certainly done by machine.
But that does not mean the computer can think completely like a person. Some manual intervention is needed. It is not that people go and judge the sentiment and product attributes for Nokia themselves. Rather, people give the computer some guidance. It is a combination of human and machine. In all our analysis the machine takes on most of the work, and the human part is to keep adjusting the logic and the basis the machine uses for its judgments, so that the machine produces more accurate results.
Guest question: Since a machine does it, take a post like "This morning I drank a can of Coke on my way to work." That has little to do with Coke itself. How do you handle this?
Yumin Wei: That is really for you to judge. A post like that contributes to the mention rate and not much else. In text mining, the human-machine process involves one very important thing, which may be a bit technical: we call it a taxonomy. For example, looking at the same sites, you may have different points of view. We helped some sports brands with market research and online reputation analysis. On the same ten sites, I might want to see the discussion of teams, or how users discuss stars, or how users discuss products. The computer cannot predict from which point of view you want to look at the data. People have to tell the computer what angle they want, and the medium for that is a taxonomy, which people set up first. If I want to see team information, I set up some key information and tell the natural language analysis, which can then segment according to the classification.
Take the example you just gave. If you want to study the relationship between going to work and Coke, you can analyze from that angle. If the post just mentions Coke, it may be counted under the product in the analysis. It still depends on the user's analysis needs. The user decides the angle so that the computer knows how to help.
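A taxonomy in this sense can be as simple as a mapping from analysis angles to keywords that a person sets up first. The sketch below is a deliberately naive illustration: the categories, keywords, and whitespace tokenization are assumptions, and Chinese text would need word segmentation instead of `split()`.

```python
# Hypothetical taxonomy for a sports brand: the analyst defines the angles.
TAXONOMY = {
    "team": ["team", "club"],
    "star": ["star", "player"],
    "product": ["shoes", "jersey"],
}

def classify(post, taxonomy=TAXONOMY):
    """Return the taxonomy categories whose keywords occur in the post."""
    words = set(post.lower().split())
    return sorted(
        category
        for category, keywords in taxonomy.items()
        if words & set(keywords)
    )

print(classify("The new jersey looks great"))
print(classify("That player carried the team"))
print(classify("I drank a can of Coke this morning"))
```

The same posts fall into different categories if the analyst swaps in a different taxonomy, which is the point made above: the person decides the angle, and the machine applies it at scale.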
Guest question: You just mentioned text data from BBS. Most of the collection we see is text-based. Can pictures and video now be accurately captured and collected?
Yumin Wei: Not yet. This involves more advanced technology. We see some companies abroad working on it, but the core of this work is still at a relatively early stage. You may have seen that Google now has image search technology: you upload a photo and it tells you what it is. Using this kind of analysis to support word-of-mouth analysis is more complex and currently more difficult.
Say you want to analyze which products and attributes are being discussed and find the corresponding images. With current technology this is still relatively difficult, so our core analysis is still mainly text-based. But I believe that as the core technology matures, one day it will be integrated into word-of-mouth analysis.
Guest question: I help customers with viral campaigns, for example with video. We set a title, but after the video or picture spreads we cannot find the nodes of the spread. Do you have a good way to find these nodes? Also, for information on SNS, is there a good way to capture data during the transmission process?
Yumin Wei: As you said, when a picture or video spreads and its text changes, it cannot be tracked, and with current technology it is really hard to find. A monitoring platform can do some more generic detection. It cannot tell you exactly that the video has spread to another place if its text has changed. But if the media are within your monitoring scope, and the changed title still contains your original keyword and the post becomes very popular, there is still a chance of finding it. Once you discover it, you can add it to your tracking scope.
Overall, with current technology the text label is a very important piece of information for tracking the effect of the spread. Once it is changed, monitoring may be affected. This need may have to wait until image analysis or video analysis technology matures.
On the second issue you raised: SNS holds a lot of data. In any analysis, most of the data is third-party data, which raises the matter of whether the owner is willing to open it. If they are not willing to open the data and you must analyze it, that is a problem. Our approach is usually to seek cooperation with these media and arrange some internal data exchange, on the premise of protecting user privacy. They provide data, and our analysis and results can help them and go back to the client side.
Guest question: In the data collection process, how do people guide the machine?
Yumin Wei: As discussed earlier, the choice of target sites comes first. Under the 80/20 principle, for example, you choose the 20% of sites that hold 80% of the data, not one hundred percent. Once the sites are chosen, there is no further filtering at collection time, which is why mass data storage and processing capacity are needed, as I just said. Standards for having the machine filter at that stage are very difficult to set: however well you set them, a lot of valuable information may still be left out. So we usually bring all the information back into our analysis system.
There are really two points of intervention. The first is telling the natural language system from what angle to analyze the data. The second relates to the need you just mentioned. On our data analysis platform people do not take part in the process itself, but they do in the final results. Say you see that very many people discussed a product this month. You then need to look at some of the specific content. Here the computer helps you pick out the content, perhaps hundreds or thousands of items, and groups similar information together. It is not that people never read the data, which would be unrealistic. The machine just helps you narrow the scope, and the final qualitative conclusions are still made by people.
Guest question: How is positive or negative information identified?
Yumin Wei: The machine can help you do these things, but you have to give it some reference in advance. The simplest way, with the lowest technical requirement, is to give the machine that reference together with some syntactic analysis, so that it can pick out the negative cases that concern what you are really focused on.
The second question is how you come to know the words in the first place. We have another piece of technology to fill this gap. From studying the network we found that language changes very fast and new words keep appearing. Sometimes I go to a new forum and find that I do not understand what people are saying at all, and it takes a while to become familiar. New word discovery marks out the new words that appear on the network each month. In the end a person still has to judge whether a word is a product nickname, a brand nickname, or an emotion word. Emotion words are a bit more complicated. A word also cannot be separated from its context: the same word may mean different things for different attributes and products. We group all of this under semantic analysis. The key is the meaning behind the word.
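A very rough sketch of monthly new-word discovery: flag tokens that occur repeatedly this month but never appeared before, then hand them to a person for judgment. The threshold `min_count`, the sample posts, and the whitespace tokenization are assumptions. Real systems for Chinese text need word segmentation and statistical candidate extraction.

```python
from collections import Counter

def new_words(previous_posts, current_posts, min_count=2):
    """Return tokens seen at least min_count times now and never before."""
    known = {w for post in previous_posts for w in post.lower().split()}
    counts = Counter(w for post in current_posts for w in post.lower().split())
    return sorted(w for w, n in counts.items() if n >= min_count and w not in known)

# Hypothetical posts from last month and this month.
previous_posts = ["the keyboard is great and easy to use"]
current_posts = ["the n71 keyboard rocks", "my n71 is great", "great phone"]

candidates = new_words(previous_posts, current_posts)
print(candidates)
```

The output is only a candidate list. As described above, a person still decides whether each candidate is a product nickname, a brand nickname, or an emotion word.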
Guest question: In your analysis, the main data may be the first post. But there is a problem. Say I post that I drank a cup of tea and give my attitude to it. The machine can filter the main content, but what matters more is the attitude of the people replying below, with words such as "delicious" or "do not drink it". How do you capture and analyze that?
Yumin Wei: It depends on the crawling process: whether only the first post is crawled, or the replies as well. If the replies are crawled, this information can be analyzed, because in your system both the first post and the replies are in the database and you know the structure. When you evaluate a technology for later data analysis, one question to ask for BBS is whether it crawls the replies in addition to the first post. As far as I know, many BBS analyses target only the first post or the posts on the first page. Replies have a relatively high noise level and may run to many pages, but they also hold great value. As long as the replies are in the system, the analysis you describe is achievable.
Guest question: For microblogs, to what degree can you now effectively capture information?
Yumin Wei: Microblogging is just beginning, but we have already included it in our internal systems. In terms of timeliness and quality, microblog data is very good and complete, because microblogs have APIs through which you can get the complete content. We can see who posted it and the exact posting time.
On timeliness: microblog data updates very fast. If you analyze only microblogs, I believe a technical expert can develop a program to do it. But sometimes we need to provide a comprehensive analysis. You are not just analyzing microblogs. You may want to see what the attitude is on microblogs and what it is on BBS, so you have to balance timeliness. Our fastest update frequency here is about once a day. But for microblogs alone, updates can in fact follow each post, because the technical channels are fully open.
I know that abroad many do near-real-time data analysis, almost every minute ...