
A Survey on Self Destructing Data: Solution for Privacy Risks in OSNs
Reshmi T S
Department of Information and Technology, Bannari Amman Institute of Technology, Sathyamangalam, India 638 401

Abstract: Online Social Networks (OSNs) play a vital role in our day-to-day lives. Facebook, the most popular social network, alone currently counts 2.23 billion users worldwide. Many OSN users are aware of the various security risks in these networks, including privacy violations, and they use the privacy settings provided by OSN providers to keep their data safe. However, most of them are unaware of the risk that remains after they delete their data, which is not actually deleted from the OSN server. Self-destruction of data is one of the primary recommended methods for achieving assured deletion. Numerous techniques have been developed for the self-destruction of data. The purpose of this paper is to discuss and evaluate these techniques, along with the various privacy risks faced by OSN users in this web-centred world.

1.Introduction

With the rapid increase in network speed from 3G to 4G, Online Social Networks (OSNs) are in growing demand. This demand comes not only from technically minded users but even more from the general public. Social networks provide an easy platform for users to interact with each other, directly or indirectly, through posts, chats, shares, etc. Facebook, Twitter and Instagram are prominent among them.
In this scenario, individual privacy is of utmost importance. Most social network platforms provide privacy policies and provisions that let users select the mode of privacy they need, i.e. which information about them others can read, see or share. For example, the MySpace website lets users under 18 make their profile content available only to their friends and to other users under 18 years old6. Facebook users can configure their personal privacy and security settings according to their preferences. Despite these provisions, most social network platforms are unable to protect user privacy completely.

2.Online Privacy Risks

a)Surveillance Problem: In OSNs, the personal information and social interactions of users are utilised by governments and service providers7. For example, suppose a Facebook user uploads a group photo in which some of his/her friends are tagged. Some of these friends may have protested against the government without yet being identified. In that case, the authorities can use the tagged photo to find them. Such practices may limit free speech and weaken the relationship between users and the OSN.
b)Social Privacy: Social privacy concerns the worries that users raise and the harms they experience when OSNs violate social boundaries7. Such violations have led to many issues, such as damaged reputation, interpersonal conflicts, presentation anxiety, unwanted contacts, context collision, stalking, peer pressure and blackmail. OSNs provide privacy settings in which users can set their boundaries, but these settings have problems of their own. A variety of decision-making problems recur when users configure them. Users may fail to predict their future preferences, and most settle for what suits them at present. They often do not consider how visible their content actually is under the settings they have chosen. Therefore, when a user makes a privacy setting, they should be able to see precisely what its consequences are.
c)Institutional Privacy: Institutional privacy refers to users losing control and oversight over the collection and processing of their information in OSNs7. It also covers the use of users’ data by service providers or hosts. Existing service providers do not let users decide whether their data may be used. Instead, they simply include this in their terms and conditions, which most users never read7. Service providers store, use and sell users’ data to third parties and advertisers. This makes user data vulnerable and leaves users with little privacy.
d)Longitudinal Privacy: Managing privacy becomes more challenging with the passage of time; this is known as longitudinal privacy5. Users’ privacy preferences change over time for many reasons: the relevance of shared content changes, biographical status changes, friendship relations change, and so on. Even after a user withdraws his/her public posts, the user’s past interactions with other users leave a trace of residual posts on the site. These residual activities are sufficient for outsiders to recover a significant amount of information about the user who withdrew the post. This type of privacy concern is especially evident on Facebook. There, even after a user deletes a post, any outsider who copied the post’s URL can still access it.
e)Deletion Delay: After a post is deleted on a social network platform, an outsider can still access it using its URL. This is called deletion delay6. A delay of a few seconds or minutes would not be a serious problem, but on most OSN sites the deletion delay is 30 days. The same happens when a user deletes an OSN account: Facebook’s privacy policy states that it takes 90 days to erase the content completely from its servers. Users, however, assume that once the account is deleted no trace of their data remains on the OSN servers, so the common user is misled. The same problem exists with cross-links6. If a user posts a photo on Instagram, a notification appears on Facebook. Even if the user deletes the photo from Instagram, it remains visible on Facebook as a small icon. Deletion delay is therefore one of the major privacy issues that OSN users face today.

3.Why Deletion Delay?
There are a few main reasons for the delay in deleting a user’s data:
a)Technical: Most sites run on clusters of machines, some of which serve web pages while others hold data. To give a site global reach and rapid response times, multiple copies of the data are often made and may be stored in different regions. As a result, it takes time to find and delete all of them6.
b)Business: Even after an account is closed, data about the user is still valuable to advertisers and others6. Providers also hope that users will come back, and if they return, they can start from where they left off.
c)Legal: OSN providers are legally obliged6 to keep some data about users, so that they can produce information about a user whenever a legal search arises.
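The technical reason (a) can be illustrated with a small simulation. This is a hypothetical model rather than the design of any real OSN: `ReplicatedStore`, `Replica` and the per-region lag values are invented for illustration. A delete is scheduled on every regional copy but applied only after that region's lag. A read served by a lagging region therefore still returns a post the user believes is gone.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One regional copy of the data (hypothetical model)."""
    name: str
    lag: int  # time units before a deletion reaches this region
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (apply_at, key) deletions not yet applied

    def tick(self, now: int) -> None:
        """Apply every pending deletion whose propagation time has arrived."""
        waiting = []
        for apply_at, key in self.pending:
            if apply_at <= now:
                self.data.pop(key, None)
            else:
                waiting.append((apply_at, key))
        self.pending = waiting


class ReplicatedStore:
    def __init__(self, lags: dict[str, int]):
        self.replicas = [Replica(name, lag) for name, lag in lags.items()]

    def put(self, key: str, value: str) -> None:
        # Every write is copied to all regions so reads are fast everywhere.
        for r in self.replicas:
            r.data[key] = value

    def delete(self, key: str, now: int) -> None:
        # The delete is only scheduled; each region applies it after its own lag.
        for r in self.replicas:
            r.pending.append((now + r.lag, key))

    def read(self, region: str, key: str, now: int):
        for r in self.replicas:
            r.tick(now)
        replica = next(r for r in self.replicas if r.name == region)
        return replica.data.get(key)


store = ReplicatedStore({"us": 0, "eu": 5, "asia": 30})
store.put("post42", "photo.jpg")
store.delete("post42", now=0)
print(store.read("us", "post42", now=1))     # None: already deleted in this region
print(store.read("asia", "post42", now=1))   # photo.jpg: deletion not yet applied
print(store.read("asia", "post42", now=30))  # None
```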

4.Self Destructing Data
With the development of OSNs and the popularisation of the internet, OSN services are becoming more important in everyday life. When people post or share personal, private information, they expect service providers to protect their data against leaks, so that others cannot invade their privacy. As people rely more on the internet and OSNs, their privacy faces greater risks. People are not aware of how their data is processed, transformed and stored. Their privacy can be breached through a service provider’s negligence, intrusion by hackers, or legal action. These problems arise mainly during the deletion delay. From the user’s point of view, the data has already been deleted, so they do not check on it again. This makes the deletion delay an easy time for intruders to mount an attack.
Self-destruction of data4 and its copies aims to protect the privacy of user data: all the data and its copies are destroyed or made unreadable after a user-specified time. In this web-centred world, self-destructing data is broadly applicable, because users’ sensitive data can otherwise persist on OSN servers indefinitely. With self-destructing data, users can control the lifetime of their private messages on Facebook or private photos on Flickr4.

5.Literature Survey

1) Vanish:
Vanish11 is built on a Distributed Hash Table (DHT) and exploits three properties of DHTs:
Large-scale geographical distribution of nodes across many countries.
Reliable distributed storage.
Constant change: nodes change continually, and stored data is discarded after a fixed interval of time (8 hours).
Vanish encrypts the user data with a key and splits that key into n shares using Shamir’s secret sharing algorithm. The shares are then distributed among the nodes of the DHT. After a fixed time interval, the DHT nodes discard their entries, so the key shares are destroyed. Without the key, the data cannot be retrieved later.
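Vanish's split-and-expire idea can be sketched with a textbook implementation of Shamir's scheme over a prime field. This is an illustration under stated assumptions, not Vanish's code. The function names, the 7-of-10 threshold and the dictionary standing in for the DHT are all hypothetical choices. So is the toy SHA-256 XOR cipher, which takes the place of a real block cipher.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # Mersenne prime; keys and shares live in the field mod PRIME


def split_secret(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Shamir: make `secret` the constant term of a random degree-(k-1) polynomial f
    and hand out the n points (x, f(x)) for x = 1..n. Any k points recover f(0)."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, f(x)) for x in range(1, n + 1)]


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def toy_encrypt(key: int, data: bytes) -> bytes:
    """Toy XOR cipher with a SHA-256 counter-mode keystream (it is its own inverse).
    Illustration only; a real system would use a vetted cipher."""
    key_bytes = key.to_bytes(16, "big")
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key_bytes + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Vanish-style flow with a 7-of-10 threshold; the dict stands in for the DHT.
key = secrets.randbelow(PRIME)
ciphertext = toy_encrypt(key, b"private message")
dht = dict(split_secret(key, n=10, k=7))  # node index -> stored share

# Before the timeout, any 7 shares rebuild the key and decrypt the message.
print(toy_encrypt(recover_secret(list(dht.items())[:7]), ciphertext))  # b'private message'

# After the timeout the nodes drop their entries (simulated here for 5 nodes).
for node in list(dht)[:5]:
    del dht[node]
print(len(dht), "shares left, 7 needed")  # 5 shares left, 7 needed
```

Note that `ciphertext` is untouched by the expiry step. This is exactly the weakness discussed next.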
The major disadvantage of this system is that only the key is destroyed; the ciphertext is still available. An attacker who holds the ciphertext may therefore be able to recover the data by cryptanalysis or a brute-force attack.

2) Safe Vanish:
Safe Vanish10 is an extended version of Vanish. As in Vanish, the message is encrypted with a key, and the key is divided into shares using Shamir’s secret sharing method. However, Safe Vanish extends the length range of the key shares. The larger the threshold, the more key shares an attacker needs to capture, and the more secure the ciphertext is. In addition, Safe Vanish encrypts the key itself with the RSA algorithm before pushing it to the DHT. This improves data security by making it harder for attackers to obtain the key. However, deletion still depends on the expiry time of the DHT nodes, and only the key is destroyed. The remaining ciphertext is a disadvantage.
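A simple model makes concrete why a larger threshold helps. The model is an assumption rather than part of Safe Vanish: suppose an attacker captures each of the n shares independently with probability p. The chance of holding at least k shares, which is enough to rebuild the key, is then the binomial tail sum over i = k..n of C(n, i) p^i (1 - p)^(n - i). This sum shrinks as k grows. The function name and the example values n = 10 and p = 0.3 are hypothetical.

```python
from math import comb


def capture_probability(n: int, k: int, p: float) -> float:
    """P(attacker holds at least k of n shares), each captured independently with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Raising the threshold k lowers the attacker's chance of rebuilding the key.
for k in (3, 5, 7, 9):
    print(f"n=10, k={k}: {capture_probability(10, k, 0.3):.4f}")
```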

3) FADE:
In FADE (File Assured Deletion)9, each uploaded file is associated with a single file access policy, and each policy is associated with a control key. All control keys are stored in a key manager. An uploaded file is encrypted with a data key, and the data key is in turn encrypted with the control key of the file’s policy. When a policy is revoked, the corresponding control key is removed from the key manager. After that, neither the data key nor the encrypted file can be recovered. At this point FADE considers the file deleted, since it cannot be recovered. The system has several disadvantages: a) after deletion, the ciphertext is still available and can be attacked by an intruder; b) the removal of the keys of revoked policies and the safe storage of the keys of active policies both rest on assumptions about the key manager.
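The FADE key hierarchy can be sketched as follows. This is a simplified model, not FADE's implementation. `KeyManager`, `upload`, `download` and the toy SHA-256 XOR cipher are hypothetical, and symmetric key wrapping stands in for the cryptography FADE actually uses. Revoking a policy deletes its control key. After that, the wrapped data key, and therefore the file, can no longer be decrypted, even though the encrypted file itself is still stored.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Illustration only: XOR with a SHA-256 counter-mode keystream (encrypts and decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class KeyManager:
    """Holds one control key per active policy (hypothetical interface)."""

    def __init__(self):
        self._control_keys: dict[str, bytes] = {}

    def control_key(self, policy: str) -> bytes:
        # Create the policy's control key on first use.
        return self._control_keys.setdefault(policy, secrets.token_bytes(32))

    def revoke(self, policy: str) -> None:
        # Revoking a policy means deleting its control key.
        self._control_keys.pop(policy, None)

    def has(self, policy: str) -> bool:
        return policy in self._control_keys


def upload(km: KeyManager, policy: str, plaintext: bytes) -> tuple[bytes, bytes]:
    data_key = secrets.token_bytes(32)
    encrypted_file = toy_cipher(data_key, plaintext)
    wrapped_key = toy_cipher(km.control_key(policy), data_key)  # data key under control key
    return encrypted_file, wrapped_key


def download(km: KeyManager, policy: str, encrypted_file: bytes, wrapped_key: bytes) -> bytes:
    if not km.has(policy):
        raise PermissionError(f"policy {policy!r} revoked: data key unrecoverable")
    data_key = toy_cipher(km.control_key(policy), wrapped_key)
    return toy_cipher(data_key, encrypted_file)


km = KeyManager()
blob, wrapped = upload(km, "project-x", b"quarterly report")
print(download(km, "project-x", blob, wrapped))  # b'quarterly report'
km.revoke("project-x")
# download(km, "project-x", blob, wrapped) now raises PermissionError,
# yet `blob` (the ciphertext) still exists: disadvantage (a) above.
```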

4) SeDas:
In SeDas (Self Destructing Data System)8, the self-destruct method consists of secret key parts and a survival time parameter for each key part. The uploaded data is first encrypted with a key. This key is then divided into n shares using Shamir’s secret sharing algorithm, and a certain amount of data is allocated to each key share. Each key share is accompanied by a TTL (Time To Live) value. The data owner assigns the TTL when the data is uploaded, and it is responsible for the self-destructing property of SeDas. Once the TTL of a key share expires, that share is destroyed. Since the key can then no longer be reconstructed, other users cannot retrieve the data. The disadvantage of the system is that, after the TTL expires, only the key is destroyed; the ciphertext is still available to intruders.
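The TTL mechanism can be modelled as follows. `ShareStore`, `KeyShare` and the single owner-chosen TTL applied to every share are hypothetical. The share values are opaque placeholders; in SeDas they would be Shamir shares. The clock is passed in explicitly so that the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class KeyShare:
    index: int
    value: int         # share payload (e.g. a Shamir share); opaque here
    expires_at: float  # upload time + owner-chosen TTL


class ShareStore:
    """Hypothetical store that purges key shares once their TTL has passed."""

    def __init__(self):
        self.shares: list[KeyShare] = []

    def upload(self, values: list[int], now: float, ttl: float) -> None:
        for i, v in enumerate(values, start=1):
            self.shares.append(KeyShare(i, v, now + ttl))

    def purge(self, now: float) -> None:
        # Self-destruction step: drop every share whose TTL has expired.
        self.shares = [s for s in self.shares if s.expires_at > now]

    def can_rebuild_key(self, threshold: int, now: float) -> bool:
        self.purge(now)
        return len(self.shares) >= threshold


store = ShareStore()
store.upload(values=[11, 22, 33, 44, 55], now=0.0, ttl=3600.0)  # 5 shares, 1-hour TTL
print(store.can_rebuild_key(threshold=3, now=1800.0))  # True: still within the TTL
print(store.can_rebuild_key(threshold=3, now=3600.0))  # False: all shares destroyed
```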

5) Self data destruction:
In self data destruction, the user specifies the lifespan of a document at upload time4. The system continuously checks whether the lifespan of any document has expired. As soon as it has, the document is deleted from the server using a data destruction algorithm4. The disadvantage of the system is that, once the user has specified the lifetime, there is no provision to update it.
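The periodic lifespan check can be sketched as a sweep over stored documents. `DocumentServer` and its methods are hypothetical and do not reproduce the data destruction algorithm cited above. Unlike the key-based schemes, this sketch removes the stored document itself. Consistent with the limitation just mentioned, it offers no method to change a lifespan after upload.

```python
class DocumentServer:
    """Hypothetical server that deletes documents once their lifespan has elapsed."""

    def __init__(self):
        self.documents: dict[str, bytes] = {}
        self.expiry: dict[str, float] = {}

    def upload(self, doc_id: str, content: bytes, now: float, lifespan: float) -> None:
        # The lifespan is fixed at upload time; there is no update method.
        self.documents[doc_id] = content
        self.expiry[doc_id] = now + lifespan

    def sweep(self, now: float) -> list[str]:
        """One pass of the periodic check; returns the ids that were destroyed."""
        expired = [d for d, t in self.expiry.items() if t <= now]
        for doc_id in expired:
            del self.documents[doc_id]  # the stored document itself is removed
            del self.expiry[doc_id]
        return expired


server = DocumentServer()
server.upload("cv.pdf", b"...", now=0, lifespan=60)
server.upload("notes.txt", b"...", now=0, lifespan=600)
print(server.sweep(now=120))     # ['cv.pdf']
print(sorted(server.documents))  # ['notes.txt']
```

In a running system, `sweep` would be invoked repeatedly by a scheduler. How often it runs bounds how long an expired document can linger.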

6) In Gmail:
Gmail recently added a new security mode that lets users send messages with an expiration date2. Incoming messages look like typical emails, but when the expiration time is reached, the body text disappears. If the recipient uses a different email service, they cannot view the message in the usual way. Instead, Gmail sends them an email containing a link, together with a separate email or text message containing a passcode. They can follow the link and enter the passcode to unlock the message, but the link stops working after the expiration date. Telegram Messenger, Snapchat, Facebook Messenger and Confide also use self-destruction policies to enhance privacy2. A disadvantage is that the expiry time, once specified by the user, cannot be changed.
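The expiring-link flow described above can be sketched as follows. It is a hypothetical design, not Gmail's implementation. `ExpiringMessageService`, the 16-byte URL token, the six-digit passcode and the explicit `now` clock are all assumptions made for illustration. The token would travel in the emailed link and the passcode over a separate channel. Once the expiry time passes, the stored body is discarded and the link stops working.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtectedMessage:
    body: str
    passcode: str
    expires_at: float


class ExpiringMessageService:
    """Hypothetical link-plus-passcode scheme (not Gmail's actual implementation)."""

    def __init__(self):
        self._messages: dict[str, ProtectedMessage] = {}

    def send(self, body: str, now: float, ttl: float) -> tuple[str, str]:
        token = secrets.token_urlsafe(16)             # would go in the emailed link
        passcode = f"{secrets.randbelow(10**6):06d}"  # would go by separate email/SMS
        self._messages[token] = ProtectedMessage(body, passcode, now + ttl)
        return token, passcode

    def open(self, token: str, passcode: str, now: float) -> Optional[str]:
        msg = self._messages.get(token)
        if msg is None:
            return None
        if now >= msg.expires_at:
            del self._messages[token]  # expired: the body is discarded for good
            return None
        if not hmac.compare_digest(msg.passcode, passcode):
            return None
        return msg.body


svc = ExpiringMessageService()
token, code = svc.send("meet at 5", now=0, ttl=7 * 24 * 3600)  # one-week expiry
print(svc.open(token, code, now=3600))           # meet at 5
print(svc.open(token, code, now=8 * 24 * 3600))  # None: the link has expired
```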

7) Retrograde Storage:
Retrograde1 is a method that erases expired messages completely and makes them unrecoverable by data recovery utilities. Messages that need to be erased after reading are stored in a recycle pool, where they wait to be overwritten by new data. The Retrograde storage model is based on a frequently colliding hash table.
A hash table is a structure that maps keys to values. A hash function assigns each key to a bucket, and if two keys are assigned to the same bucket, a collision occurs. Retrograde storage exploits this collision property. To make the hash table collide frequently, the number of buckets to which keys are mapped is reduced. For example, suppose 2,450 keys are assigned to 1,000,000 buckets, as in a traditional hash table. Even with a perfectly random distribution, there is a 95% probability of at least one collision1. In Retrograde storage, the same 2,450 keys are assigned to only 100 buckets. Collisions are then certain (probability 100%)1, because there are more keys than buckets. As the number of collisions increases, so does the amount of overwriting. Retrograde uses the multipass Gutmann method3 to make messages unrecoverable.
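The collision figures quoted above follow from the birthday-problem calculation, assuming that keys are hashed uniformly and independently. For n keys and m buckets, P(no collision) is the product over i = 0..n-1 of (m - i)/m. When n > m, one factor is zero, so a collision is certain. The function name below is illustrative.

```python
def collision_probability(keys: int, buckets: int) -> float:
    """Probability that at least two of `keys` uniformly hashed keys share a bucket."""
    p_no_collision = 1.0
    for i in range(keys):
        p_no_collision *= (buckets - i) / buckets
        if p_no_collision == 0.0:  # more keys than buckets: pigeonhole
            break
    return 1.0 - p_no_collision


print(round(collision_probability(2450, 1_000_000), 3))  # 0.95
print(collision_probability(2450, 100))                  # 1.0
```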
Because the Gutmann method is time-consuming1, the Retrograde storage method is also slow. This is the major disadvantage of the system.
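The time cost comes from writing every byte once per pass. The sketch below is a simplified multipass overwrite, not the full Gutmann method. It keeps Gutmann's overall 35-pass shape, with 4 random passes at the start and 4 at the end. However, the 27 fixed byte patterns in between are placeholders rather than Gutmann's actual sequence. The function name and the pattern encoding (`None` meaning a random pass) are hypothetical.

```python
import os


def multipass_overwrite(buf: bytearray, patterns: list) -> int:
    """Overwrite `buf` in place once per pattern and return the total bytes written.

    Each pattern is either a bytes value (repeated to fill the buffer) or None,
    which means a pass of random bytes.
    """
    written = 0
    for pattern in patterns:
        if pattern is None:
            buf[:] = os.urandom(len(buf))
        else:
            reps = len(buf) // len(pattern) + 1
            buf[:] = (pattern * reps)[: len(buf)]
        written += len(buf)
    return written


message = bytearray(b"expired secret message")  # 22 bytes
schedule = [None] * 4 + [b"\x55", b"\xaa", b"\x92\x49\x24"] * 9 + [None] * 4
total = multipass_overwrite(message, schedule)
print(len(schedule), total)  # 35 770: 35 passes over 22 bytes
```

The work grows linearly with the number of passes, which is why a 35-pass scheme is slow. Overwriting an in-memory buffer only illustrates this cost. On real storage, file systems and SSD wear levelling may retain old copies of the data.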

6. Conclusion
This paper has reviewed several methods for self-destructing data, which offers a solution for assured deletion of data in Online Social Networks. The self-destruction methods Vanish and Safe Vanish depend on a DHT (Distributed Hash Table). In these methods, as well as in SeDas, destroying the key used for encryption makes the data unreadable. In Gmail and in self data destruction, the user specifies the expiry time of a message at upload time. FADE is a policy-based data destruction method, and the Retrograde method is based on frequently colliding hash tables. The survey identifies the pros and cons of the different self-destruction methods. Overall, these studies will help in the future to design a method for data self destr